Broad coverage paragraph segmentation across languages and domains
Similar resources
Automatic Paragraph Identification: A Study across Languages and Domains
In this paper we investigate whether paragraphs can be identified automatically in different languages and domains. We propose a machine learning approach which exploits textual and discourse cues and we assess how well humans perform on this task. Our best models achieve an accuracy that is significantly higher than the best baseline and, for most data sets, comes to within 6% of human perform...
Comprehension across Application Domains and Languages
This work demonstrates that our natural language understanding framework can be applied across application domains and languages with ease. Approaches towards language understanding generally involve much handcrafting, e.g. in writing grammars or annotating corpora, hence portability is a desirable trait in the development of language understanding systems. Our framework for natural language un...
Multi-Paragraph Segmentation of Expository Texts
We present a method for partitioning expository texts into coherent multi-paragraph units which reflect the subtopic structure of the texts. Using Chafe's Flow Model of discourse, we observe that subtopics are often expressed by the interaction of multiple simultaneous themes. We describe two fully-implemented algorithms that use only term repetition information to determine the extents of the s...
Broad Coverage Automatic Morphological Segmentation of German Words
A system for the automatic segmentation of German words into morphs was developed. The main linguistic knowledge sources used by the system are a word syntax and a morph dictionary. The syntax is written in the formalism of right linear regular grammars and comprises approximately 1,400 rules describing the set of those sequences of morph classes which underlie syntactically well formed words. ...
Journal
Journal title: ACM Transactions on Speech and Language Processing
Year: 2006
ISSN: 1550-4875,1550-4883
DOI: 10.1145/1149290.1151098